Alert volume normalization in a video surveillance system

ABSTRACT

Techniques are disclosed for normalizing and publishing alerts using a behavioral recognition-based video surveillance system configured with an alert normalization module. Certain embodiments allow a user of the behavioral recognition system to provide the normalization module with a set of relative weights for alert types and a maximum publication value. Using these values, the normalization module evaluates an alert and determines whether its rareness value exceeds a threshold. Upon determining that the alert exceeds the threshold, the module normalizes and publishes the alert.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation application of co-pending U.S. patent application Ser. No. 13/836,730, filed on Mar. 15, 2013, granted as U.S. Pat. No. 9,349,275, which itself claims benefit of U.S. Provisional Patent Application Ser. No. 61/611,284, filed Mar. 15, 2012, which is herein incorporated by reference.

BACKGROUND

1. Field of the Invention

Embodiments of the invention disclosed herein generally relate to techniques for reporting anomalous behavior to users of a behavioral recognition-based video surveillance system. More specifically, embodiments of the invention provide a framework for normalizing the number of alerts generated for multiple disjoint alert types.

2. Description of the Related Art

Some currently available video surveillance systems provide simple object recognition capabilities. For example, a video surveillance system may be configured to classify a group of pixels (referred to as a “blob”) in a given frame as being a particular object (e.g., a person or vehicle). Once identified, a “blob” may be tracked from frame-to-frame in order to follow the “blob” moving through the scene over time, e.g., a person walking across the field of vision of a video surveillance camera. Further, such systems may be configured to determine when an object has engaged in certain predefined behaviors. For example, the system may include definitions used to recognize the occurrence of a number of predefined events, e.g., the system may evaluate the appearance of an object classified as depicting a car (a vehicle-appear event) coming to a stop over a number of frames (a vehicle-stop event). Thereafter, a new foreground object may appear and be classified as a person (a person-appear event) and the person then walks out of frame (a person-disappear event). Further, the system may be able to recognize the combination of the first two events as a “parking-event.”

However, such surveillance systems typically require that the objects and/or behaviors which may be recognized by the system be defined in advance. Thus, in practice, these systems rely on predefined definitions for objects and/or behaviors to evaluate a video sequence. Unless the underlying system includes a description for a particular object or behavior, the system is generally incapable of recognizing that behavior (or at least instances of the pattern describing the particular object or behavior). More generally, such systems rely on predefined rules and static patterns and are thus often unable to dynamically identify objects, events, behaviors, or patterns, much less classify them as either normal or anomalous.

Moreover, end users of these rules-based surveillance systems typically specify events which should result in an alert. However, this poses a problem in practice because a typical rule-based surveillance system generates, on average, thousands of alerts per day per camera, and a user presented with so many alerts becomes unable to discern which alerts are of high importance. Thus, these rules-based systems are of limited usefulness with regard to notifying a user of important security alerts.

SUMMARY

One embodiment of the invention provides a method for normalizing and publishing alerts using a behavioral recognition system configured with a normalization module. This method may generally include receiving an alert having a type and an original rareness value and converting the original rareness value to an alert percentile value. This method may also include normalizing and publishing the alert upon determining that the alert percentile value is greater than an alert percentile threshold.

Other embodiments include, without limitation, a computer-readable medium that includes instructions that enable a processing unit to implement one or more aspects of the disclosed methods as well as a system having a processor, memory, and application programs configured to implement one or more aspects of the disclosed methods.

BRIEF DESCRIPTION OF THE DRAWINGS

So that the manner in which the above recited features, advantages, and objects of the present disclosure are attained and can be understood in detail, a more particular description, briefly summarized above, may be had by reference to the embodiments illustrated in the appended drawings.

It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the disclosure may admit to other equally effective embodiments.

FIG. 1 illustrates components of a video analysis system, according to one embodiment.

FIG. 2 further illustrates components of the video analysis system shown in FIG. 1, according to one embodiment.

FIG. 3 illustrates an alert that a behavioral recognition system may generate, according to one embodiment.

FIG. 4 illustrates a method for processing alerts within a behavioral recognition system configured with a normalization module, according to one embodiment.

FIG. 5 illustrates a method for publishing a normalized alert, according to one embodiment.

FIG. 6 illustrates graphical representations of an example specific use case of an alert normalization module in a behavioral recognition system, according to one embodiment.

DETAILED DESCRIPTION

Embodiments of the invention disclosed herein provide a framework for normalizing the number of alerts generated from multiple disjoint alert types in a behavioral recognition system. The disclosed framework provides statistical consistency across different alert types and ensures that the behavioral recognition system presents a relatively consistent number of alerts to the user regardless of the number of alert types available in the system.

A behavioral recognition system may be configured to learn, identify, and recognize patterns of behavior by observing a sequence of individual frames, otherwise known as a video stream. Unlike a rules-based video surveillance system, which contains predefined patterns of what to identify, the behavioral recognition system disclosed herein learns patterns by generalizing input and building memories of what is observed. Over time, the behavioral recognition system uses these memories to distinguish between normal and anomalous behavior within the field of view captured within a video stream. Generally, the field of view is referred to as the “scene.”

In one embodiment, the behavioral recognition system includes a computer vision engine and a machine learning engine. The computer vision engine may be configured to process a scene, generate information streams of observed activity, and then pass the streams to the machine learning engine. In turn, the machine learning engine may be configured to learn object behaviors in that scene, build models of certain behaviors within a scene, and determine whether observations indicate that the behavior of an object is anomalous, relative to the model.

In one embodiment, the machine learning engine may support multiple alert types triggered by a variety of different behavioral pattern categories, such as activity, motion, speed, velocity, and trajectory. Similarly, other alert types may depend on interactions between objects, including collision and position. For each alert type, the system learns the normal behaviors in the scene and generates alerts on abnormal activities. A rules-based video surveillance system alerts a user to anomalies that the user specifies, while a behavioral recognition system alerts a user to whatever the system identifies as anomalous.

However, a behavioral recognition system may generate a large volume of alerts. Additionally, a behavioral system may include a large variety of alert types, and occurrences of one alert type may arise at a different frequency from occurrences of another alert type. Although there are some similarities in alerts at an abstract level, alert types are mostly disjoint in their behavioral recognition characteristics. For instance, the anomaly model of a high velocity alert type may differ greatly from a model of a high acceleration alert type. Further, given the relative occurrence rate of each alert type, the distribution of rare alerts will differ across alert types.

Therefore, to keep the overall number of alerts from overwhelming a user, and to select which alerts to publish, a behavioral recognition system may be configured with an alert normalization module. In one embodiment, a user may provide a desired alert publication rate and a set of relative weights for the alert types supported by the system. In another embodiment, a user may also provide a desired dispatch rate. Alert publication generally refers to the behavioral recognition system publishing an alert to an interface, where it may be viewed and acted upon by an operator, and alert dispatch generally refers to the behavioral recognition system notifying the user of an alert, e.g., by sending e-mail or by generating a display on a graphical user interface. For example, a user may specify that the behavioral recognition system should publish one hundred alerts per day, distributed with equal relative weights across alert types. In such a case, the alert normalization module may evaluate a distribution of previously published alerts, e.g., over the last seven days, for each alert type to identify a distribution for each alert type. Once the distribution is determined, a normalized rareness threshold, expected to result in the correct number of alerts for that type, may be calculated. The alert normalization module may use these threshold values in determining which alerts to present to the user.

In one embodiment, the machine learning engine processes information from observations made by the computer vision engine. For example, a camera focused on a parking lot may record cars passing through the scene, and the machine learning engine may process events corresponding to cars passing through the scene for alert types such as high speed, low speed, and abnormal trajectory. For each event, the machine learning engine assigns a rareness value for each alert type. The machine learning engine may discard alerts with a rareness value that falls below a threshold. In turn, the machine learning engine may pass alerts for alert types with high rareness values (i.e., those greater than a threshold) through the alert normalization module.

The normalization module receives an alert from the machine learning engine and converts the alert's rareness value into an alert percentile. An alert percentile is a value based on an alert's rareness value compared to the rareness values of historical alerts of that alert type. Once the normalization module converts the rareness value into an alert percentile, the normalization module compares the alert percentile to a percentile threshold value and discards the alert if the percentile value falls below the threshold. If the percentile value is above the threshold, then the normalization module converts the percentile value into a normalized alert rareness value. This value is placed into a composite normalized rareness histogram that provides the module with data to assign a numerical publication rank for the alert. After assigning a rank to the alert, the normalization module publishes the alert with the rank value. For example, a behavioral recognition system configured to publish one hundred alerts per day publishes an unusual trajectory alert of rank twelve.
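To make this flow concrete, the following Python sketch traces an alert through the stages just described. It is an illustrative reading only, not the claimed implementation; the function names are hypothetical, and the percentile conversion is assumed to be a rank lookup against a sorted buffer of historical rareness values for the alert type.

```python
import bisect

def rareness_to_percentile(rareness, history):
    """Percentile of this alert's rareness among historical rareness
    values of the same alert type (history assumed sorted ascending)."""
    if not history:
        return 0.0
    return bisect.bisect_left(history, rareness) / len(history)

def process_alert(alert_type, rareness, history, thresholds, rank_of):
    """Discard the alert, or normalize it and return its publication rank."""
    percentile = rareness_to_percentile(rareness, history[alert_type])
    xi = thresholds[alert_type]
    if percentile <= xi:
        return None                           # below threshold: discard
    eta = (percentile - xi) / (1.0 - xi)      # normalized rareness (see FIG. 5)
    return rank_of(eta)                       # rank used for publication
```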

In the following, reference is made to embodiments of the invention. However, it should be understood that the invention is not limited to any specifically described embodiment. Instead, any combination of the following features and elements, whether related to different embodiments or not, is contemplated to implement and practice what is disclosed. Furthermore, in various embodiments the present invention provides numerous advantages over the prior art. However, although embodiments may achieve advantages over other possible solutions and/or over the prior art, whether or not a particular advantage is achieved by a given embodiment is not limiting. Thus, the following aspects, features, embodiments and advantages are merely illustrative and are not considered elements or limitations of the appended claims except where explicitly recited in a claim(s). Likewise, any reference to “the invention” or “the disclosure” shall not be construed as a generalization of any inventive subject matter disclosed herein and shall not be considered to be an element or limitation of the appended claims except where explicitly recited in a claim(s).

One embodiment of the present invention is implemented as a program product for use with a computer system. The program(s) of the program product defines functions of the embodiments (including the methods described herein) and can be contained on a variety of computer-readable storage media. Examples of computer-readable storage media include (i) non-writable storage media (e.g., read-only memory devices within a computer such as CD-ROM or DVD-ROM disks readable by an optical media drive) on which information is permanently stored; and (ii) writable storage media (e.g., floppy disks within a diskette drive or hard-disk drive) on which alterable information is stored. Such computer-readable storage media, when carrying computer-readable instructions that direct the functions of the present disclosure, are embodiments of the present disclosure. Other example media include communications media through which information is conveyed to a computer, such as through a computer or telephone network, including wireless communications networks.

In general, the routines executed to implement the embodiments of the present disclosure may be part of an operating system or a specific application, component, program, module, object, or sequence of instructions. The computer program of the present disclosure typically comprises a multitude of instructions that will be translated by the native computer into a machine-readable format and hence executable instructions. Also, programs are comprised of variables and data structures that either reside locally to the program or are found in memory or on storage devices. In addition, various programs described herein may be identified based upon the application for which they are implemented in a specific embodiment of the disclosure. However, it should be appreciated that any particular program nomenclature that follows is used merely for convenience, and thus the invention should not be limited to use solely in any specific application identified and/or implied by such nomenclature.

FIG. 1 illustrates components of a video analysis and behavioral recognition system 100, according to one embodiment of the present invention. As shown, the behavioral recognition system 100 includes a video input source 105, a network 110, a computer system 115, and input and output devices 118 (e.g., a monitor, a keyboard, a mouse, a printer, and the like). The network 110 may transmit video data recorded by the video input 105 to the computer system 115. Illustratively, the computer system 115 includes a CPU 120, storage 125 (e.g., a disk drive, optical disk drive, floppy disk drive, and the like), and a memory 130 containing both a computer vision engine 135 and a machine learning engine 140. As described in greater detail below, the computer vision engine 135 and the machine learning engine 140 may provide software applications configured to analyze a sequence of video frames provided by the video input 105.

Network 110 receives video data (e.g., video stream(s), video images, or the like) from the video input source 105. The video input source 105 may be a video camera, a VCR, DVR, DVD, computer, web-cam device, or the like. For example, the video input source 105 may be a stationary video camera aimed at a certain area (e.g., a subway station, a parking lot, a building entry/exit, etc.), which records the events taking place therein. Generally, the area within the camera's field of view is referred to as the scene. The video input source 105 may be configured to record the scene as a sequence of individual video frames at a specified frame-rate (e.g., 24 frames per second), where each frame includes a fixed number of pixels (e.g., 320×240). Each pixel of each frame may specify a color value (e.g., an RGB value) or grayscale value (e.g., a radiance value between 0-255). Further, the video stream may be formatted using known formats, e.g., MPEG2, MJPEG, MPEG4, H.263, H.264, and the like.

As noted above, the computer vision engine 135 may be configured to analyze this raw information to identify active objects in the video stream, identify a variety of appearance and kinematic features used by a machine learning engine 140 to derive object classifications, derive a variety of metadata regarding the actions and interactions of such objects, and supply this information to the machine learning engine 140. In turn, the machine learning engine 140 may be configured to evaluate, observe, learn, and remember details regarding events (and types of events) that transpire within the scene over time.

In one embodiment, the machine learning engine 140 receives the video frames and the data generated by the computer vision engine 135. The machine learning engine 140 may be configured to analyze the received data, cluster objects having similar visual and/or kinematic features, and build semantic representations of events depicted in the video frames. Over time, the machine learning engine 140 learns expected patterns of behavior for objects that map to a given cluster. Thus, over time, the machine learning engine learns from these observed patterns to identify normal and/or abnormal events. That is, rather than having patterns, objects, object types, or activities defined in advance, the machine learning engine 140 builds its own model of what different object types have been observed (e.g., based on clusters of kinematic and/or appearance features) as well as a model of expected behavior for a given object type. Thereafter, the machine learning engine can decide whether the behavior of an observed event is anomalous or not based on prior learning.

Data describing whether a normal/abnormal behavior/event has been determined and/or what such behavior/event is may be provided to output devices 118 to issue alerts, for example, an alert message presented on a GUI interface screen. Further, output devices 118 may be configured to allow a user to specify the number of alerts to publish over a given time period. For example, a user may use a GUI interface to configure the behavioral recognition system 100 to publish 100 alerts per day.

In general, the computer vision engine 135 and the machine learning engine 140 both process video data in real-time. However, time scales for processing information by the computer vision engine 135 and the machine learning engine 140 may differ. For example, in one embodiment, the computer vision engine 135 processes the received video data frame-by-frame, while the machine learning engine 140 processes data every N frames. In other words, while the computer vision engine 135 may analyze each frame in real-time to derive a set of kinematic and appearance data related to objects observed in the frame, the machine learning engine 140 is not constrained by the real-time frame rate of the video input.

Note, however, that FIG. 1 illustrates merely one possible arrangement of the behavior-recognition system 100. For example, although the video input source 105 is shown connected to the computer system 115 via the network 110, the network 110 is not always present or needed (e.g., the video input source 105 may be directly connected to the computer system 115). Further, various components and modules of the behavior-recognition system 100 may be implemented in other systems. For example, in one embodiment, the computer vision engine 135 may be implemented as a part of a video input device (e.g., as a firmware component wired directly into a video camera). In such a case, the output of the video camera may be provided to the machine learning engine 140 for analysis. Similarly, the output from the computer vision engine 135 and machine learning engine 140 may be supplied over computer network 110 to other computer systems. For example, the computer vision engine 135 and machine learning engine 140 may be installed on a server system and configured to process video from multiple input sources (i.e., from multiple cameras). In such a case, a client application 250 running on another computer system may request (or receive) the results over network 110.

FIG. 2 further illustrates components of the computer vision engine 135 and the machine learning engine 140 first illustrated in FIG. 1, according to one embodiment of the invention. As shown, the computer vision engine 135 includes a data ingestor 205, a detector 210, a tracker 215, a context event generator 220, an alert generator 225, and an event bus 230. Collectively, the components 205, 210, 215, and 220 provide a pipeline for processing an incoming sequence of video frames supplied by the video input source 105 (indicated by the solid arrows linking the components). In one embodiment, the components 210, 215, and 220 may each provide a software module configured to provide the functions described herein. Of course, one of ordinary skill in the art will recognize that the components 205, 210, 215, and 220 may be combined (or further subdivided) to suit the needs of a particular case and further that additional components may be added (or some may be removed) from a video surveillance system.

In one embodiment, the data ingestor 205 receives video input from the video input source 105. The data ingestor 205 may be configured to preprocess the input data before sending it to the detector 210. The detector 210 may be configured to separate each frame of video provided into a stationary or static part (the scene background) and a collection of volatile parts (the scene foreground). The frame itself may include a two-dimensional array of pixel values for multiple channels (e.g., RGB channels for color video or a grayscale or radiance channel for black and white video). In one embodiment, the detector 210 may model background states for each pixel using an adaptive resonance theory (ART) network. That is, each pixel may be classified as depicting scene foreground or scene background using an ART network modeling a given pixel. Of course, other approaches to distinguish between scene foreground and background may be used.

Additionally, the detector 210 may be configured to generate a mask used to identify which pixels of the scene are classified as depicting foreground and, conversely, which pixels are classified as depicting scene background. The detector 210 then identifies regions of the scene that contain a portion of scene foreground (referred to as a foreground “blob” or “patch”) and supplies this information to subsequent stages of the pipeline. Additionally, pixels classified as depicting scene background may be used to generate a background image modeling the scene.

In one embodiment, the detector 210 may be configured to detect the flow of a scene. Once the foreground patches have been separated, the detector 210 examines, from frame-to-frame, the edges and corners of all foreground patches. The detector 210 identifies foreground patches moving in a similar flow of motion as most likely belonging to a single object or a single association of motions and sends this information to the tracker 215.

The tracker 215 may receive the foreground patches produced by the detector 210 and generate computational models for the patches. The tracker 215 may be configured to use this information, and each successive frame of raw video, to attempt to track the motion of an object depicted by a given foreground patch as it moves about the scene. That is, the tracker 215 provides continuity to other elements of the system by tracking a given object from frame-to-frame. It further calculates a variety of kinematic and/or appearance features of a foreground object, e.g., size, height, width, and area (in pixels), reflectivity, shininess, rigidity, speed, velocity, etc.

The context event generator 220 may receive the output from other stages of the pipeline. Using this information, the context event generator 220 may be configured to generate a stream of context events regarding objects tracked (by tracker component 215). For example, the context event generator 220 may package a stream of micro feature vectors and kinematic observations of an object and output this to the machine learning engine 140, e.g., at a rate of 5 Hz. In one embodiment, the context events are packaged as a trajectory. As used herein, a trajectory generally refers to a vector packaging the kinematic data of a particular foreground object in successive frames or samples. Each element in the trajectory represents the kinematic data captured for that object at a particular point in time. Typically, a complete trajectory includes the kinematic data obtained when an object is first observed in a frame of video along with each successive observation of that object up to when it leaves the scene (or becomes stationary to the point of dissolving into the frame background). Accordingly, assuming the computer vision engine 135 is operating at a rate of 5 Hz, a trajectory for an object is updated every 200 milliseconds, until complete. The context event generator 220 may also calculate and package the appearance data of every tracked object by evaluating the object for various appearance attributes such as shape, width, and other physical features and assigning each attribute a numerical score.

The computer vision engine 135 may take the output from the components 205, 210, 215, and 220 describing the motions and actions of the tracked objects in the scene and supply this information to the machine learning engine 140 through the event bus 230. Illustratively, the machine learning engine 140 includes a classifier 235, a mapper 240, a semantic module 245, a cognitive module 250, a cortex module 270, and a normalization module 265.

The classifier 235 receives context events such as appearance data from the computer vision engine 135 and maps the data on a neural network. In one embodiment, the neural network is a combination of a self-organizing map (SOM) and an ART network, shown in FIG. 2 as a SOM-ART network 236. The data is clustered and combined by features occurring repeatedly in association with each other. Then, based on those recurring types, the classifier 235 defines types of objects. For example, the classifier 235 may define foreground patches that have, for example, high shininess, rigidity, and reflectivity as a Type 1 object. These defined types then propagate throughout the rest of the system.

The cortex module 270 receives kinematic data from the computer vision engine 135 and maps the data on a neural network, shown in FIG. 2 as SOM-ART network 271. In one embodiment, SOM-ART network 271 clusters kinematic data to build common sequences of events in a scene. In another embodiment, SOM-ART network 271 clusters kinematic data from interacting trajectories to build common interactions in a scene. By learning common sequences of events and interactions within the scene, the cortex module 270 aids the machine learning engine in detecting anomalous sequences and interactions.

The mapper 240 uses these types by searching for spatial and temporal correlations and behaviors across the system for foreground patches to create maps of where and when events are likely or unlikely to happen. In one embodiment, the mapper 240 includes a temporal memory ART network 241, a spatial memory ART network 242, and statistical engines 243. For example, the mapper 240 may look for patches of Type 1 objects. The spatial memory ART network 242 uses the statistical engines 243 to create statistical data of these objects, such as where in the scene these patches appear, in what direction these patches tend to go, how fast these patches go, whether these patches change direction, and the like. The mapper 240 then builds a neural network of this information, which becomes a memory template against which to compare object behaviors. The temporal memory ART network 241 uses the statistical engines 243 to create statistical data based on samplings of time slices. In one embodiment, initial sampling occurs at thirty-minute intervals. If many events occur within a time slice, then the time resolution may be dynamically changed to a finer resolution. Conversely, if fewer events occur within a time slice, then the time resolution may be dynamically changed to a coarser resolution.

In one embodiment, the semantic module 245 includes a phase space partitioning component 246 and an anomaly detection component 247. The semantic module 245 identifies patterns of motion or trajectories within a scene and analyzes the scene for anomalous behavior through generalization. By tessellating a scene and dividing the foreground patches into many different tesserae, the semantic module 245 traces an object's trajectory and learns patterns from the trajectory. The semantic module 245 analyzes these patterns and compares them with other patterns. As objects enter a scene, the phase space partitioning component 246 builds an adaptive grid and maps the objects and their trajectories onto the grid. As more features and trajectories are populated onto the grid, the machine learning engine learns trajectories that are common to the scene and further distinguishes normal behavior from anomalous behavior.

In one embodiment, the cognitive module 250 includes a perceptual memory 251, an episodic memory 252, a long-term memory 253, a workspace 254, and codelets 255. Generally, the workspace 254 provides a computational engine for the machine learning engine 140. For example, the workspace 254 may be configured to copy information from the perceptual memory 251, retrieve relevant memories from the episodic memory 252 and the long-term memory 253, and select which codelets 255 to execute. In one embodiment, each codelet 255 is a software program configured to evaluate different sequences of events and to determine how one sequence may follow (or otherwise relate to) another (e.g., a finite state machine). More generally, a codelet may provide a software module configured to detect interesting patterns from the streams of data fed to the machine learning engine. In turn, the codelet 255 may create, retrieve, reinforce, or modify memories in the episodic memory 252 and the long-term memory 253. By repeatedly scheduling codelets 255 for execution and copying memories and percepts to/from the workspace 254, the machine learning engine 140 performs a cognitive cycle used to observe, and learn about, patterns of behavior that occur within the scene.

In one embodiment, the perceptual memory 251, the episodic memory 252, and the long-term memory 253 are used to identify patterns of behavior, evaluate events that transpire in the scene, and encode and store observations. Generally, the perceptual memory 251 receives the output of the computer vision engine 135 (e.g., a stream of context events). The episodic memory 252 stores data representing observed events with details related to a particular episode, e.g., information describing time and space details related to an event. That is, the episodic memory 252 may encode specific details of a particular event, i.e., “what and where” something occurred within a scene, such as a particular vehicle (car A) moved to a location believed to be a parking space (parking space 5) at 9:43 AM.

In contrast, the long-term memory 253 may store data generalizing events observed in the scene. To continue with the example of a vehicle parking, the long-term memory 253 may encode information capturing observations and generalizations learned by an analysis of the behavior of objects in the scene, such as “vehicles tend to park in a particular place in the scene,” “when parking, vehicles tend to move at a certain speed,” and “after a vehicle parks, people tend to appear in the scene proximate to the vehicle.” Thus, the long-term memory 253 stores observations about what happens within a scene with much of the particular episodic details stripped away. In this way, when a new event occurs, memories from the episodic memory 252 and the long-term memory 253 may be used to relate and understand the current event, i.e., the new event may be compared with past experience, leading to reinforcement, decay, and adjustments to the information stored in the long-term memory 253 over time. In a particular embodiment, the long-term memory 253 may be implemented as an ART network and a sparse-distributed memory data structure. Importantly, however, this approach does not require the different object type classifications to be defined in advance.

In one embodiment, modules 235, 240, 245, 250, and 270 each include an anomaly detection component, as depicted by components 237, 244, 247, 256, and 272. Each module may be configured to identify anomalous behavior, relative to past observations of the scene. If any module identifies anomalous behavior, its corresponding anomaly detection component generates an alert and passes the alert to the normalization module 265. For instance, anomaly detector 247 in the semantic module 245 detects unusual trajectories using learned patterns and models. If a foreground object exhibits loitering behavior, for example, anomaly detector 247 evaluates the object trajectory using loitering models, subsequently generates an alert, and sends the alert to the normalization module 265. Upon receiving an alert, the normalization module 265 evaluates whether the alert should be published.

FIG. 3 illustrates an example alert 300 generated by the behavioral recognition system, according to one embodiment. As shown, the alert 300 includes a description 305, a category 310, a type 315, and a rareness value 320. Of course, an alert may include additional data. The description 305 may include information about the alert, such as when the event occurred (i.e., relative to a time index of a scene), where the event occurred (i.e., relative to the coordinates of a foreground object in a scene), or how long the event took place. The category 310 is an identifier that corresponds to a behavioral pattern. Velocity, trajectory, and motion are all examples of alert categories. A type 315 corresponds to an alert type. High velocity, sequence trajectory, and anomalous motion are all examples of alert types.

In one embodiment, a rareness value 320 ranges between 0 and 1 and reflects how common the occurrence is relative to past occurrences corresponding to a certain alert type. A value of 0 represents the most common (or least rare) event, while a value of 1 represents the least common (or rarest) event. The machine learning engine assigns rareness values for each alert type to all occurrences observed by the behavioral recognition system. In a behavioral recognition system that is newly deployed, the machine learning engine may initially assign high rareness values to events in a scene (because the machine learning engine is processing information on newly observed behavioral patterns), but over time, the machine learning engine assigns rareness values more accurately after learning more behavioral patterns.

FIG. 4 illustrates a method 400 for processing alerts within a behavioral recognition system configured with an alert normalization module, according to one embodiment. The method begins at step 405, where the computer vision engine processes an event in a scene and sends the information (e.g., appearance and kinematics data of the foreground object) to the machine learning engine. For example, a camera focused on a parking garage may record a car driving at a speed of thirty-five miles per hour. The computer vision engine generates information, such as data about the car's trajectory and speed, and passes it to the machine learning engine. At step 410, the machine learning engine assigns the behavior a rareness value for each alert type. In this example, because the machine learning engine typically observes cars (i.e., a classification model representing a car) moving at a speed of ten miles per hour through the scene, the system may assign a rareness value of 0.85 to the high speed alert type. At step 415, the machine learning engine determines whether the event corresponds to anomalous behavior (e.g., through one of the machine learning engine's anomaly detection components). If not, the system discards the event. Otherwise, if the event is anomalous, the system generates an alert for the event at step 420. The machine learning engine sends the alert with its rareness value to the normalization module. In the ongoing example, the machine learning engine passes the observation of the car driving at thirty-five miles per hour as an alert through the normalization module if the system finds such behavior anomalous.

FIG. 5 illustrates a method 500 for normalizing and publishing an alert in a behavioral recognition system. For this method, presume that a user has specified to the normalization module a maximum publication rate (i.e., a number of alerts to publish per day) and a set of relative weights per alert type. As shown, the method begins at step 505, where the normalization module receives an alert from an anomaly detection component in the machine learning engine. For example, the anomaly detection component in the semantic module may generate an unusual trajectory alert and send it to the normalization module. At step 510, the normalization module converts the alert's rareness value to an alert percentile value.

After calculating the alert percentile, at step 515, the normalization module evaluates whether the alert percentile is greater than a percentile threshold. The percentile threshold is calculated using a value for the normalization module's maximum allowed alert counts for an alert type and an estimated value for the alert counts of the next day. In one embodiment, the maximum allowed alert counts for an alert type i, represented in the following equation as v_(i), may be determined as follows:

$v_{i} = \frac{\omega_{i}P}{\sum_{i=1}^{M}\omega_{i}},\qquad(1)$

where ω_(i) represents the relative publication weight given for that alert type, P represents the desired publication rate, and M represents the total number of alert types in the behavioral recognition system. Further, in one embodiment, the value for the alert counts of the next day, represented in the following equation as N_(i), may be estimated as follows:

$N_{i} = \frac{\sum_{k=1}^{B}\alpha_{k}n_{k}}{\sum_{k=1}^{B}\alpha_{k}},\qquad(2)$

where B represents a historical buffer (in days), n_(k) represents the alert counts observed on the k^(th) day in the past, and α_(k) represents a set of relative weights for each daily alert count in the historical buffer. Using the maximum allowed volume for an alert type and the next-day alert counts value, the percentile threshold, represented in the following equation as ξ_(i), may be calculated as follows:

$\xi_{i} = 1 - \frac{v_{i}}{N_{i}}.\qquad(3)$

In one embodiment, the normalization module updates the percentile threshold on a daily basis using historical alert percentile values.
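A minimal sketch of equations (1) through (3) follows, assuming the per-type weights and the historical daily counts are held in plain Python lists; the helper names mirror the symbols above and are not from the source.

```python
def max_allowed_counts(weights, publish_rate):
    """Equation (1): split the desired publication rate P across alert
    types in proportion to their relative weights omega_i."""
    total = sum(weights)
    return [w * publish_rate / total for w in weights]

def estimated_next_day_counts(daily_counts, day_weights):
    """Equation (2): weighted average of the last B daily alert counts
    n_k for one alert type, using relative day weights alpha_k."""
    return (sum(a * n for a, n in zip(day_weights, daily_counts))
            / sum(day_weights))

def percentile_threshold(v_i, n_i):
    """Equation (3): pass only the top v_i / N_i fraction of alerts."""
    return 1.0 - v_i / n_i
```

As a worked example under these assumptions, with four alert types of equal weight and P = 100, each v_(i) is 25; if the weighted history for one type predicts N_(i) = 500 alerts tomorrow, then ξ_(i) = 1 − 25/500 = 0.95, so only alerts above the 95th percentile survive.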

Upon determining that the alert percentile is greater than the percentile threshold (step 515), the normalization module converts the alert percentile value to a composite normalized rareness value. The normalization module does this by estimating the rareness of the alert relative to its own distribution within the alert type. In one embodiment, a normalized rareness value, represented here as η_(i), may be obtained through this formula:

$\eta_{i} = \frac{\varepsilon_{i} - \xi_{i}}{1 - \xi_{i}},\qquad(4)$

where ε_(i) represents the alert percentile value. This approach ensures statistical consistency across values of multiple disjoint alert types. For example, alerts of two different alert types that each have a normalized rareness value of 0.9 have the same statistical rarity and may be regarded as having the same importance despite their underlying anomaly models being different.

The normalization module populates the normalized rareness value of the alert into a composite normalized rareness histogram. The composite normalized rareness histogram provides the normalization module with data to create a publication rank for the alert. Using this data and the given publication and dispatch rates, the normalization module computes the alert's publication rank (step 525). In one embodiment, the equation to calculate a certain publication rank β for a given alert i is as follows:

β=min(P,N_(p))*(1−η_(i))  (5),

where N_(p) is a rank-renormalization constant estimated by computing the maximum value of historical daily published alerts over the last B days. Alerts with high normalized rareness values have a lower publication rank than alerts with low values. The normalization module publishes alerts in order of publication rank (step 530). In another embodiment, a user may configure the alert normalization module to dispatch (e.g., by sending e-mail or by generating a display on a graphical user interface) a certain number of alerts. In such a case, the normalization module dispatches the alert only if its publication rank is less than the maximum dispatch number.
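Equations (4) and (5), together with the dispatch test just described, might be sketched as follows; the helper names are hypothetical.

```python
def normalized_rareness(epsilon, xi):
    """Equation (4): rescale the surviving tail (xi, 1] onto (0, 1]."""
    return (epsilon - xi) / (1.0 - xi)

def publication_rank(eta, publish_rate, rank_norm):
    """Equation (5): rarer alerts (eta near 1) receive ranks near 0."""
    return min(publish_rate, rank_norm) * (1.0 - eta)

def should_dispatch(rank, max_dispatch):
    """Dispatch (e.g., e-mail) only alerts ranked below the dispatch cap."""
    return rank < max_dispatch
```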

The normalization module is unable to normalize the alert using equation (4) in cases where the alert's original rareness value is equal to 1. Instead, in one embodiment, η_(i) may be obtained through this formula:

$\eta_{i} = 1 - \mathrm{rand}\!\left(\frac{m + \sum_{k=1}^{B}m_{k}}{p + \sum_{k=1}^{B}p_{k}}\right)\quad\text{if}\ r = 1,\qquad(6)$

where rand(x) denotes a uniform random number in [0, x], m is the current number of alerts having a rareness value of 1 observed for the day, p is the current number of published alerts for the day, m_(k) and p_(k) are the corresponding counts for the k^(th) day in the historical buffer, and r is the alert's original rareness value.
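A sketch of this boundary case, with Python's `random.uniform` standing in for rand(x) and the historical counts m_(k) and p_(k) assumed to be plain lists:

```python
import random

def normalized_rareness_at_one(m, m_hist, p, p_hist):
    """Equation (6): when the original rareness value r equals 1, draw
    the normalized value from the top of the range, sized by how often
    rareness-1 alerts occur relative to published alerts."""
    x = (m + sum(m_hist)) / (p + sum(p_hist))
    return 1.0 - random.uniform(0.0, x)
```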

In another embodiment, the normalization module may calculate N_(i) to account for differing alert type volumes on specific days of the week. For example, the machine learning engine may, on a weekly basis, consistently generate more alerts on a Friday than on a Monday, and a user may want the normalization module to estimate the total number of counts for a given day using the counts of that day a week ago. To handle day-specific normalization, N_(i) may be calculated as:

$N_{i} = \theta\,\frac{\sum_{k=1}^{B}\alpha_{k}\gamma_{k}}{\sum_{k=1}^{B}\alpha_{k}} + (1 - \theta)\,\frac{\sum_{k=1}^{B}\omega_{kd}}{B},\qquad(7)$

where θ is a multiplier that represents the weight given to daily composites, γ represents the number of object-specific composites observed for the current day, and d represents the day for which the count needs to be estimated.
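One possible reading of equation (7) is sketched below. Because the source defines γ and ω_(id) only briefly, their interpretation here, as recent daily counts and same-weekday counts respectively, is an assumption.

```python
def day_specific_estimate(theta, day_weights, daily_counts, weekday_counts):
    """Equation (7), as read here: blend a weighted average of recent
    daily counts (gamma) with the mean count observed on the same
    weekday d over the historical buffer of B days."""
    recent = (sum(a * g for a, g in zip(day_weights, daily_counts))
              / sum(day_weights))
    weekday = sum(weekday_counts) / len(weekday_counts)
    return theta * recent + (1.0 - theta) * weekday
```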

Further, in another embodiment, the normalization module adjusts the percentile threshold value in cases of alert overshoot. Alert overshoot occurs in situations where the behavioral recognition system observes more anomalous events in a day than anticipated, resulting in a percentile threshold that is too low and thus more alerts crossing the threshold value. The normalization module adapts to this by increasing the percentile threshold toward 1 using an overage value λ. In one embodiment, the adjustment is represented in the following formula:

$\xi_{i} = \frac{N_{i} + p_{i} - (1 + \lambda)v_{i}}{N_{i}}\quad\text{if}\ p_{i} > v_{i},\qquad(8)$

where p_(i) represents the number of published alerts for the current day. Note, however, that the normalization module does not modify the percentile threshold if the number of published alerts falls below the maximum volume of alerts allowed by the system for that alert type, and further note that the normalization module places an upper bound on the number of alerts that any alert type can publish.
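Equation (8) might be applied as in the following sketch; the guard reflects the preceding note that the threshold is left unchanged while the published count stays within the allowance.

```python
def adjusted_threshold(xi, n_i, p_i, v_i, overage):
    """Equation (8): once today's published count p_i exceeds the
    allowance v_i, push the percentile threshold toward 1 so that
    fewer subsequent alerts survive."""
    if p_i <= v_i:
        return xi                  # within budget: leave threshold as-is
    return (n_i + p_i - (1.0 + overage) * v_i) / n_i
```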

FIG. 6 illustrates a graphical representation of an example specific use case of the alert normalization module. For the purposes of this example, a user sets a desired publication rate of 100 alerts per day, a desired dispatch rate of five alerts per day, and equal relative weights for all alert types in a system with four alert types overall. The figure represents the publication and dispatch of alerts through the four alert types in a given day, where each alert type is expected to publish a maximum of twenty-five alerts. The graphs on the left side of the figure (605, 610, 615, and 620) show the representative distribution of alerts from all alert types. For simplicity, continuous distributions are shown, and the number of alerts observed within the day is considered to match exactly the estimated next-day alerts based on the historical daily-alerts data. The lined portions of each graph represent alerts that will be published. Note that the twenty-five alerts with the highest original rareness values from each of the first three alert types are published, as well as all ten of the alerts from the last alert type. The alerts that are not published do not satisfy the percentile threshold.

Note that all of the alerts to be published from the individual alert types are automatically uniformly distributed by their normalized rareness values across the composite normalized rareness histogram 625, displayed on the right side of the figure. The normalization module uses data in the histogram to calculate the publication ranks of the alerts to be published. In one embodiment, the computed rank is further used to decide whether any particular alert should be dispatched. More specifically, all eighty-five alerts coming from the individual alert types are published with publication ranks between 0 and 85. Further note that the smaller the rank, the more important the published alert, and thus the normalization module dispatches all alerts below the desired dispatch rate, in this case alerts with a rank below 5.

As described, embodiments of the present invention provide a framework for normalizing the number of alerts generated for multiple disjoint alert types in a behavioral recognition-based video surveillance system. By using a desired alert publication rate and relative weights of different alert types, a normalization module receives an alert having a certain rareness value and converts this value to a percentile. If the percentile is greater than a threshold value, then the module normalizes the alert and publishes the alert to the user. Advantageously, this approach brings statistical consistency across rareness values of different alert types and ensures the publication of a relatively consistent number of alerts regardless of the number of alert types within the system.

While the foregoing is directed to embodiments of the present disclosure, other and further embodiments of the disclosure may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.

What is claimed is:
1. A method for normalizing and publishing an alert generated by a behavioral recognition system, the method comprising: receiving an alert having a type and an original rareness value; converting the original rareness value to an alert percentile value; upon determining that the alert percentile value is greater than a percentile threshold, normalizing the alert; and publishing the alert.
2. The method of claim 1, wherein the percentile threshold is $\xi_{i} = 1 - \frac{v_{i}}{N_{i}},$ where v_(i) is a maximum allowed alert counts value for the alert type and N_(i) is an estimated next-day alert counts value.
3. The method of claim 2, wherein the percentile threshold is $\xi_{i} = \frac{N_{i} + p_{i} - (1 + \lambda)v_{i}}{N_{i}}\ \text{if}\ p_{i} > v_{i},$ where p_(i) is a number of published alerts for a current day and λ is an overage value.
4. The method of claim 1, wherein normalizing the alert further comprises calculating a normalized rareness value $\eta_{i} = \frac{\varepsilon_{i} - \xi_{i}}{1 - \xi_{i}},$ where ε_(i) is the alert percentile value and ξ_(i) is the percentile threshold.
5. The method of claim 4, wherein the normalized rareness value is $\eta_{i} = 1 - \mathrm{rand}\!\left(\frac{m + \sum_{k=1}^{B}m_{k}}{p + \sum_{k=1}^{B}p_{k}}\right)\ \text{if}\ r = 1,$ where m is a current number of alerts having an original rareness value of 1 for a current day, p is a current number of published alerts for a current day, and r is the original rareness value.
6. The method of claim 1, wherein publishing the alert further comprises calculating a publication rank value β=min(P,N_(p))*(1−η_(i)), where P is a maximum publication value, N_(p) is a rank-renormalization constant, and η_(i) is a normalized rareness value.
7. The method of claim 6, further comprising, upon determining that the publication rank value is less than a maximum dispatch value, dispatching the alert.
8. A computer-readable storage medium storing instructions which, when executed on a processor, perform an operation for normalizing and publishing an alert generated by a behavioral recognition system, the operation comprising: receiving an alert having a type and an original rareness value; converting the original rareness value to an alert percentile value; upon determining that the alert percentile value is greater than a percentile threshold, normalizing the alert; and publishing the alert.
9. The computer-readable storage medium of claim 8, wherein the percentile threshold is $\xi_{i} = 1 - \frac{v_{i}}{N_{i}},$ where v_(i) is a maximum allowed alert counts value for the alert type and N_(i) is an estimated next-day alert counts value.
10. The computer-readable storage medium of claim 9, wherein the percentile threshold is $\xi_{i} = \frac{N_{i} + p_{i} - (1 + \lambda)v_{i}}{N_{i}}\ \text{if}\ p_{i} > v_{i},$ where p_(i) is a number of published alerts for a current day and λ is an overage value.
11. The computer-readable storage medium of claim 8, wherein normalizing the alert further comprises calculating a normalized rareness value $\eta_{i} = \frac{\varepsilon_{i} - \xi_{i}}{1 - \xi_{i}},$ where ε_(i) is the alert percentile value and ξ_(i) is the percentile threshold.
12. The computer-readable storage medium of claim 11, wherein the normalized rareness value is $\eta_{i} = 1 - \mathrm{rand}\!\left(\frac{m + \sum_{k=1}^{B}m_{k}}{p + \sum_{k=1}^{B}p_{k}}\right)\ \text{if}\ r = 1,$ where m is a current number of alerts having an original rareness value of 1 for a current day, p is a current number of published alerts for a current day, and r is the original rareness value.
13. The computer-readable storage medium of claim 8, wherein publishing the alert further comprises calculating a publication rank value β=min(P,N_(p))*(1−η_(i)), where P is a maximum publication value, N_(p) is a rank-renormalization constant, and η_(i) is a normalized rareness value.
14. The computer-readable storage medium of claim 13, further comprising, upon determining that the publication rank value is less than a maximum dispatch value, dispatching the alert.
15. A system comprising: a processor; and a memory hosting an application which, when executed on the processor, performs an operation for normalizing and publishing an alert generated by a behavioral recognition system, the operation comprising: receiving an alert having a type and an original rareness value; converting the original rareness value to an alert percentile value; upon determining that the alert percentile value is greater than a percentile threshold, normalizing the alert; and publishing the alert.
16. The system of claim 15, wherein the percentile threshold is $\xi_{i} = 1 - \frac{v_{i}}{N_{i}},$ where v_(i) is a maximum allowed alert counts value for the alert type and N_(i) is an estimated next-day alert counts value.
17. The system of claim 16, wherein the percentile threshold is $\xi_{i} = \frac{N_{i} + p_{i} - (1 + \lambda)v_{i}}{N_{i}}\ \text{if}\ p_{i} > v_{i},$ where p_(i) is a number of published alerts for a current day and λ is an overage value.
18. The system of claim 15, wherein normalizing the alert further comprises calculating a normalized rareness value $\eta_{i} = \frac{\varepsilon_{i} - \xi_{i}}{1 - \xi_{i}},$ where ε_(i) is the alert percentile value and ξ_(i) is the percentile threshold.
19. The system of claim 18, wherein the normalized rareness value is $\eta_{i} = 1 - \mathrm{rand}\!\left(\frac{m + \sum_{k=1}^{B}m_{k}}{p + \sum_{k=1}^{B}p_{k}}\right)\ \text{if}\ r = 1,$ where m is a current number of alerts having an original rareness value of 1 for a current day, p is a current number of published alerts for a current day, and r is the original rareness value.
20. The system of claim 15, wherein publishing the alert further comprises calculating a publication rank value β=min(P,N_(p))*(1−η_(i)), where P is a maximum publication value, N_(p) is a rank-renormalization constant, and η_(i) is a normalized rareness value.
21. The system of claim 20, further comprising, upon determining that the publication rank value is less than a maximum dispatch value, dispatching the alert.